The modern household deals with a large number of documents in printed or
handwritten form that should be filed and occasionally retrieved. Misplacement of
a document, loss of or damage to a document that is the only copy, or inaccessibility of
a document when and where required may have grave consequences.

Presently, an individual wanting to do more than simply file the hard
copies can scan documents and store the image files on his own memory device,
such as the hard drive of a personal computer. However, the files take up valuable
memory. In addition, the process of indexing the image files to facilitate retrieval
requires extra effort on the part of the individual. For example, the individual can
create folders and subfolders for different types of documents, or the individual can
group certain documents on a diskette which has the appropriate label marked on it,
or the individual can implement a full-featured database structure. Such
environments for local storage and management of scanned documents require that
the user be capable of handling sophisticated information technologies, performing
periodic backups, dealing with redundancy and reliability aspects, etc.

For high-end, sophisticated enterprise customers, there are companies
which offer solutions to document management (see for example www.kofoc.com),
and companies which offer internet-based, centralized document storage and
retrieval services (see for example www.lmagfcsiio.com).

For individual users, there are some non-web-based personal archiving
systems (see for example www.mvdocuments.com) that offer limited options to the
individual consumer. These systems offer an environment for orderly storage on the
user's local disk, with limited accessibility over the Internet.

There are also some web sites targeted at individual consumers which allow
centralized storage and retrieval of on-line photo albums (see for example
www.cartogra.com).

It is an object of the invention to provide a system and method for archiving
[...] at a remote storage site.

It is another object of the invention to provide a system and method for
[...] personal documents for storage.

It is another object of the invention to provide a system and method for
[...] images of personal documents.

It is another object of the invention to provide a system and method for
[...] archived images of personal documents.

It is another object of the invention to provide these capabilities on a
[...] to consumers, having all or most of the algorithms and software
[...] installed at a central facility (facilities) where they are centrally
maintained by the personnel of the service provider.

It is yet another object of the invention to provide an archiving apparatus.

SUMMARY OF THE INVENTION

According to the present invention there is provided a method for preparing
[...] personal document for remote archiving, including the steps of:
[...] at least one index word with the at least one personal document; and
[...] at least one file related to the at least one personal document to a
[...] location.

According to the present invention there is also provided a method for
[...] archiving at least one personal document, including the steps of: receiving
[...] site at least one file related to the at least one personal document;
[...] at least one index word with the at least one file; and storing the at least
[...]

According to the present invention, there is still further provided a method
for requesting the retrieval of at least one remotely archived file related to a personal
document, including the steps of: specifying at least one index word; and receiving
the at least one file associated with the at least one specified index word.

According to the present invention, there is still further provided a method
of retrieving at least one remotely archived file related to at least one personal
document, including the steps of: receiving at least one index word associated with
the at least one remotely archived file; retrieving the at least one remotely archived
file from storage; and transmitting the retrieved at least one file.

According to the present invention there is provided a system for remotely
archiving at least one personal document, including: at least one processing element
for associating at least one index word with at least one file related to the at least
one personal document; and at least one storage element for storing the at least one
file based on the associating.

According to the present invention, there is also provided a system for
preparing at least one personal document for remote archiving, including: at least
one communication interface; and at least one device for specifying at least one
index word associated with the at least one document, coupled to the at least one
communication interface.

According to the present invention, there is still further provided a system
for retrieving at least one remotely archived file related to at least one personal
document, including: at least one storage element for storing the at least one file
related to the at least one personal document; and at least one searching element for
searching the at least one storage element for the at least one file.

According to the present invention, there is still further provided a system
for requesting the retrieval of at least one remotely archived file related to at least
one personal document, including: at least one communication interface; and at least
one device coupled to the at least one communication interface for specifying at
least one index word associated with the at least one document.

According to the present invention, there is provided a method of providing
a remote archiving service to individual users by an application service provider,
[...] registering for remote archiving service; users using
[...] and users being billed based on their usage.

According to the present invention, there is provided an apparatus for
[...] document to a remote archiving system, including a
[...] module coupled to the controller. Optionally, an indexing
[...] to the controller.



BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to
the accompanying drawings, wherein:

FIG. 1 is a generalized block diagram of the general architecture of the
system according to an embodiment of the present invention;

FIG. 2a shows a generalized block diagram of a scanning system according
to one embodiment of the invention;

FIG. 2b shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 2c shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 2d shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 2e shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 2f shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 2g shows a generalized block diagram of a scanning system according
to another embodiment of the invention;

FIG. 3 shows a block diagram of an archiving device according to an
embodiment of the present invention;

FIG. 4a shows a scheme for associating index words with documents using
keywords as embedded data according to an embodiment of the present invention;

FIG. 4b shows a scheme for associating index words with documents using
highlighted embedded data according to an embodiment of the present invention;

FIG. 4c shows a scheme for associating index words with documents using
location-sensitive embedded data according to an embodiment of the present
invention;

FIG. 5a shows a scheme for associating index words with documents using
manually keyed-in auxiliary data according to an embodiment of the present
invention;

FIG. 5b shows a scheme for associating index words with documents using
auxiliary data scanned separately from the original document according to an
embodiment of the present invention;

FIG. 5c shows a scheme for associating index words with documents using
voice annotation as auxiliary data according to an embodiment of the present
invention;

FIG. 6 shows a scheme for associating index words with documents that is
menu driven according to an embodiment of the present invention;

FIG. 7 shows a scheme for associating index words with documents that is
off-line according to an embodiment of the present invention;

FIG. 8 illustrates a retrieval method according to an embodiment of the
present invention;

FIG. 9a shows a retrieval system according to an embodiment of the present
invention;

FIG. 9b shows a retrieval system according to an embodiment of the present
invention;

FIG. 9c shows a retrieval system according to an embodiment of the present
invention;

FIG. 9d shows a retrieval system according to an embodiment of the present
invention;

FIG. 9e shows a retrieval system according to an embodiment of the present
invention;

FIG. 9f shows a retrieval system according to an embodiment of the present
invention; and

FIG. 9g shows a retrieval system according to an embodiment of the present
invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of a system and method of capturing, indexing,
[...] all types of personal documents. The method and/or system
[...] a novel archiving apparatus. Specifically, the present invention is
[...] personal documents at a remote location.

The term "Personal Documents" is used hereinafter to refer to documents that are
[...] individual use (rather than company use), and include at least one
alphanumeric character. Principles and operation of a personal documents archival
[...] method according to a preferred embodiment of the present invention
[...] understood with reference to the drawings and the accompanying
[...]

Referring now to Figure 1, there is shown the general architecture of an
archival system 10. System 10 includes three main components: one or more
capture elements 12, a central management and storage element (CMSE) 14 and one
or more retrieval elements 16. A communication interface 22 provides a connection
between one or more capture elements 12 and CMSE 14. A communication
interface 28 provides a connection between CMSE 14 and one or more retrieval
elements 16.



One or more capture elements 12 are configured to perform a scanning
function 18 and an index associating function 20 (i.e. specifying one or more indices
[...] to the scanned document). It should be evident that certain documents
[...] to be scanned to produce an image file, as a user may receive the image
[...] documents in other ways. For example, a user may download an image
[...] statement directly from his bank's web site. Alternatively, a user
[...] soft-copy version of a statement. In such cases, scanning function 18
[...] a more general "obtaining image file" function.

The "obtaining image file" function may be performed manually, i.e. when
requested by the user, or alternatively may be done automatically at a
user defined schedule (for example, a weekly download of a bank statement), or
may be done upon a certain user defined condition being met (for example, download
of a stock portfolio on a broker's web site when a change is identified in a certain
stock). The downloaded page is sent directly to storage at CMSE 14, or
alternatively to the user for later transmittal to CMSE 14. Preferably, if the
downloaded page is sent directly for storage at CMSE 14, pre-assigned index words
[...] defined condition are sent with the page. The identification of the
occurrence of the user defined condition may necessitate an appropriate monitoring
function to be incorporated in CMSE 14 or at the capture element site. It should be
noted that the invention is not bound by any particular scheme for associating
indices with documents.
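
As an illustration only, the three ways of triggering the "obtaining image file" function (on request, on a user defined schedule, or when a user defined condition is met) might be sketched as follows. The class `Trigger`, its fields, the example index words and the dates are hypothetical names and values chosen for this sketch; they do not appear in the specification, and the monitoring function is reduced here to a caller-supplied callable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Trigger:
    """Hypothetical description of when a page should be obtained for archiving."""
    pre_assigned_indices: List[str]                  # index words sent with the page
    interval: Optional[timedelta] = None             # user defined schedule, if any
    condition: Optional[Callable[[], bool]] = None   # user defined condition, if any
    last_run: Optional[datetime] = None

    def due(self, now: datetime, requested: bool = False) -> bool:
        """Return True if the page should be obtained now."""
        if requested:                                # manual: the user asked for it
            return True
        if self.interval is not None:                # scheduled, e.g. weekly
            if self.last_run is None or now - self.last_run >= self.interval:
                return True
        if self.condition is not None and self.condition():
            return True                              # monitored condition was met
        return False


# Assumed example: a weekly bank statement download, last run on 1 January 2024.
weekly_statement = Trigger(["bank", "statement"], interval=timedelta(weeks=1),
                           last_run=datetime(2024, 1, 1))
```

In this sketch the pre-assigned index words travel with the trigger, so a page obtained automatically can be stored without the user entering indices at that moment.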

Reverting now to Fig. 1, CMSE 14 is configured to perform processing,
archiving and query function 26. CMSE 14 performs both central management and
storage. The device for central management is preferably located at the same
location as the remote storage device, i.e. the physical location of the memory.
However, the device for central management can also be located at a distance from
the storage element as long as there is a data link between the central management
device and storage element. Moreover, the central management function can be
distributed over more than one device and/or location, and the storage function can
be distributed over more than one element and/or location.

One or more retrieval elements 16 are configured to perform retrieval
function 30. One or more retrieval elements 16 may or may not be identical with one
or more capture elements 12, and communication interfaces 22 and 28 may or may
not be identical to each other.

Communication interfaces 22 and 28 are shown here to connect to the
Internet 24 and thereby to CMSE 14. Alternatively, legacy interfaces 23 provide
interfaces via a fax gateway 13 and/or a voice gateway 15 to CMSE 14.

Communication interfaces 22 and 28 can connect to Internet 24 through
POTS (plain old telephone system), ISDN, xDSL, cable, wireless, satellite, etc.
However, communication means other than the Internet for intercommunication
between one or more capture elements 12 and/or one or more retrieval elements 16
are within the scope of the invention.

Integration to other communication means may be implemented either by
complying with other standards and protocols, or through gateway elements at
CMSE 14. For example, a document image may be sent to CMSE 14 by fax,
where it will be processed through fax gateway 13 and routed to the IP network for
further processing.

Alternatively, voice gateway 15 allows for storage of voice (including
converted voice-to-text) files instead of or in addition to the storage of images of a
document. Voice gateway 15 performs the function of converting voice (possibly
receiving it through a telephony system, for example) to text for storage. The voice
includes oral recitations of at least part of the contents of the document (the
document including at least one alphanumeric character). A speech to text product is
exemplified by IBM's ViaVoice Gold and Simply Speaking Gold products. This
scenario is envisioned as being particularly suitable for situations in which a
personal document includes a small amount of simple data - for example an index
card comprising a name (which might be orally spelled out character by
character) and a telephone number. Alternatively, the voice is digitized and archived
as a digital voice file (i.e. no conversion to text performed), which may be played
back later in the voice retrieval mode. The voice (including converted voice-to-text)
files can be indexed in a later session as in Figure 7.

It should be evident that the configuration of communication interfaces 22
and 28 may differ based on the communication means in use. In some of the
examples outlined above communications interfaces 22 and/or 28 might comprise
dedicated hardware and software (as in the case of a modem and appropriate drivers
in a PC, for example), but in other cases the whole communications interface 22
and/or 28 might be an integral, embedded part of the device being used (as in the
case of a web enabled cellular phone using a WAP protocol, for example).



Figures 2a-2g show typical yet not exclusive configurations for scanning
function 18 performed by one or more capture elements 12. In Figure 2a an
archiving device 32, including its own controller 42, is directly connected to
communication interface 22. Alternatively, device 32 can have a communications
interface 45 built in (see Figure 3). In Figure 2b, a scanner 33, which does not need
to include its own controller, is connected as a peripheral to a personal computer
(PC) 34 which is connected to communication interface 22. In Figure 2c, scanner 33
is connected as a peripheral to set top box 36 which is connected to communication
interface 22. In Figure 2d, scanner 33 is connected as a peripheral to game console
38 which is connected to communication interface 22. In Figure 2e, scanner 33 is
connected as a peripheral to personal digital assistant (PDA) 40 which is connected to
communication interface 22. In Figure 2f, scanner 33 is connected to a wireless
device 41 such as a cellular phone. In Figure 2g, scanner 33 is connected to a
Network Appliance 101. In Figures 2b-2g, the connection between scanner 33 and
devices 34 to 41 or 101 may be a physical connection through wires or the
connection may be wireless, for example using the Bluetooth protocol.

Figure 3 shows a block diagram of archiving device (or as sometimes
termed below "apparatus for capturing and feeding a document to a remote
archiving system") 32. The purpose of the device is to streamline and ease the use
of the system from the user's point of view, mainly in the "input" phase - i.e. the
scanning/associating indices operation. To achieve this goal, device 32 includes at
least a scanner 46 and all the functions (hardware and software) essential for
straightforward connection to the Internet - i.e. no need for any additional unit
between the scanner (now a built-in one) and Internet 24 (such as a computer, a game
console, a cellular phone etc.).

Additionally, device 32 may include other optional elements such as: 1) a
display 50 (may be used to view the scanned image, to prompt the user for action, to
display menus, for displaying retrieved documents in retrieve mode, etc.), 2) a built-in
microphone 51 and associated voice processing capability to perform voice
digitization and possibly local voice to text conversion, 3) an input module 47, used
for inputting indices and/or simple commands such as "scan document", "scan index
sheet", "get audio input from the microphone", "add input from microphone to list
of permitted index words", "show last inputted index word on the display", etc., 4)
an output module 44 - if a display is not implemented, a simple output module
(comprising, for example, a few LEDs) may be used to indicate basic status data
such as "scanner fault", "index word not recognized as a legitimate one", "index
word OK", "no more index words allowed", etc.

It should be noted that if both input module 47 and display 50 are
implemented, a variety of functions may be performed at the local level. For
example the scanned image might be displayed, before transmission to CMSE 14, to
enable the user to check the scanning quality, the results of local OCR/ICR/voice-to-text
processing (if done) may be displayed to the user, etc. In the latter case local
controller 42 may also use the display unit to prompt the user for some activity - for
example confirming that the index word derived is accurate, or to indicate a
conversion failure, etc.

The main modules of this device (some of which are optional) are presented
in Figure 3 and include:

1. Embedded controller 42, performing all the control functions as
well as the processing that is taking place at the user's site.

2. Communications module 45 - together with the controller, handles
all the communications with CMSE 14 via Internet 24. This
[...] include an adapter 21 for the specific transport media
used - i.e. cable modem, xDSL modem, POTS, cellular, etc.
Communications interface 45 might have all the adaptor modules
[...] with only one of adaptor modules 21 selected as the
[...] during device setup. In other embodiments,
communications module 45 is not built in and device 32 connects
to communications interface 22 (as in Figure 2a).

3. Scanner module 46 - performs scanning of paper inputs - either
the scanned document or an index sheet. The scanner module, on
its own or together with controller 42, might or might not do some
processing on the documents being fed, such as image
enhancement, OCR, etc.

4. Audio processing 51 - comprises a microphone and voice
digitization circuitry. The audio processing module, on its own or
together with the controller, might or might not do some
processing on the digitized voice, such as voice to text conversion.
The microphone in some embodiments is used for associating
indices through voice annotation.

5. Display module 50 - may be used for a variety of functions, as
[...]

6. Input 47 and output 44 modules - used for simple user interaction,
[...] above. To support menu-driven associating of
[...] documents, the input module might also include a
[...] select specific index words out of a list displayed
[...] module.

7. [...] interface 48 - interface to an external device such as a
[...] example for obtaining a hard copy of a retrieved
[...]

Those versed in the art will readily appreciate that the invention is not bound
[...] modules depicted in Fig. 3.
The indices which a user specifies can either be common to all users or
[...] for a particular user. A user may specify individualized keywords
[...] registering (by phone, fax, internet, regular or electronic mail,
[...] archival system 10, or at a later date as part of a maintenance
[...] or through other means (by phone, fax, internet, regular or
[...], etc).

In addition to the specification of individualized keywords, during sign-up or
[...] communication, the amount of allocated storage space and
[...] options such as voice retrieval or summary retrieval may be defined for
[...]



In certain embodiments, the index words which a user can specify are
[...]. In these cases, CMSE 14 either uses the same index words specified by a
[...] and storing the documents, or CMSE 14 includes the capabilities
[...] user-specified indices into a more limited number of words used for
[...] storing the document. CMSE 14 may query a user to ensure the user
[...] positive identification of the index words and/or of the mapping by
[...] to get further instructions in case the CMSE does not recognize a
[...] word as a legitimate one for the particular user. This CMSE/user
[...] take place either in real time or in a later session, in which case the
[...] retrieval element 16 - will be presented with a list of problem issues
[...] manual intervention. Another example of a problem situation might be
[...] document does not initially have any index data associated with it -
[...] discussion later on.
A user usually specifies more than one index word for a certain document,
[...] and efficient retrieval capabilities (for example, a March phone
[...] stored with the associated indices "March", "bill" and "phone", and
[...] "March" and "bills", or "bills" and "phone", or "March" and "bills"
[...]). The corresponding index words that CMSE 14 uses to index and
[...] need not be identical to the indices used by the user. In certain
embodiments, more than one user-specified index such as "bank", "credit union",
"[...] institution" may map onto one CMSE 14 index word such as "financial".
[...] also true in certain embodiments. For example a user may prefer to
[...] index "credit card", whereas based on the credit card number (for
[...] automatically extracted by the CMSE from a scanned image using form
[...] optical character recognition (OCR) technologies), CMSE 14 may
[...] file while adding specific credit card index words such as VISA,
[...] etc. An example of form processing technology is FormWare™
[...] by Captiva Software Corporation, headquartered in San Diego, CA.

[...] is exemplified by CharacterEyes* manufactured by Ligature, Ltd,
[...] in Jerusalem, Israel.
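
The mapping of several user-specified indices onto a smaller set of CMSE index words can be pictured with a short sketch. It assumes a simple per-user lookup table; the names `USER_TO_CMSE` and `map_indices` are hypothetical, and only the "bank" / "credit union" to "financial" example is taken from the text above. How CMSE 14 actually stores or applies such a mapping is not stated in the specification.

```python
from typing import Dict, Iterable, List

# Hypothetical per-user mapping table; only the "financial" example comes from the text.
USER_TO_CMSE: Dict[str, str] = {
    "bank": "financial",
    "credit union": "financial",
}


def map_indices(user_indices: Iterable[str], table: Dict[str, str]) -> List[str]:
    """Map user-specified index words onto CMSE index words, without duplicates."""
    mapped: List[str] = []
    for word in user_indices:
        key = word.strip().lower()
        target = table.get(key, key)      # words with no mapping are kept as-is
        if target not in mapped:
            mapped.append(target)
    return mapped
```

For example, `map_indices(["Bank", "credit union", "March"], USER_TO_CMSE)` yields `["financial", "march"]`: two user words collapse into one CMSE index word, and an unmapped word passes through unchanged.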

Schemes for associating index words with documents described in the
[...] paragraphs may be grouped into 3 major categories:

Embedded data - where the index information is derived from the
[...] itself (i.e. only the file related to the personal
[...] stored needs to be sent to CMSE 14);

Auxiliary data - where the index information is provided
[...] the file related to the personal document, for
[...] data keyed in, digitized voice sent to CMSE 14 for
[...] processing, written words scanned and sent to
[...] OCR/ICR processing, etc.;

Menu selection - where the user is presented with a list of
[...] indices, and selects a subset using a pointing
[...]






Figures 4a-4c show sample embedded data schemes where the index words
[...] in a document 46 to be scanned.

In Figure 4a, original document 52 including keywords is scanned to give
scanned image 54 which is sent to CMSE 14. When CMSE 14 receives scanned
image 54, CMSE 14 recognizes pre-defined keywords so that CMSE 14 is able to
[...] correct index/indices with the archived scanned image 54. Keyword
[...] techniques 53 include OCR and intelligent character recognition
(ICR). An exemplary product for ICR is Cleqs 7 manufactured by gentriqs Software,
headquartered in Eltville, Germany. It should be evident that under the scheme
[...] the indices specified by the user must be limited to allowed (pre-defined)
[...] (i.e. keywords) either common to all users or previously
[...] for that user.
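
A minimal sketch of the keyword scheme of Figure 4a follows, assuming the OCR/ICR step has already produced plain text for the scanned image. The function name `find_keywords` and the sample text are hypothetical; the specification does not describe how the recognized text is matched against the allowed keywords, so the token comparison below is only one possible approach.

```python
import re
from typing import List, Set


def find_keywords(ocr_text: str, allowed: Set[str]) -> List[str]:
    """Return the allowed keywords found in the OCR text, in order of first appearance."""
    found: List[str] = []
    for token in re.findall(r"[A-Za-z0-9]+", ocr_text.lower()):
        if token in allowed and token not in found:
            found.append(token)
    return found
```

With the allowed set `{"march", "bill", "phone"}`, the text "Phone BILL for March 2001" yields `["phone", "bill", "march"]`. The sketch handles single-word keywords only; a multi-word keyword such as "credit card" would need phrase matching.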

In Figure 4b, index words receive a manual highlight 56 in original document
52. When CMSE 14 receives scanned image 54 of original document 52, CMSE 14
[...] identification techniques 58 based on image processing (for example,
[...] package such as "Digital Image Processing" by WOLFRAM
[...] of Champaign, IL) to locate the index fields, and subsequently uses
[...] such as OCR or ICR to recognize the indices.

Figure 4c illustrates a scheme where index words are located in a specific
[...] of the document (i.e. a certain field in the document is filled in and
[...] interpreted as an index word). CMSE 14 performs field identification
[...] such as image processing, form processing, OCR, or ICR to recognize
[...] the indices. The field scheme could be used, for example, when
[...] forms - for example for bank statements. Once the standard
[...] identified, the "month" field (as an example) - which is generally in the
[...] each statement - can be automatically extracted for index purposes.
[...] products that may provide the technological foundation for such
[...] AFSPRO and FREEDOM, developed by Top Image Systems Ltd.
[...], Israel. It should be noted that with technologies such as those
[...] in FREEDOM, the forms need not necessarily be of a standard
[...] known to the system.
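
The location-sensitive scheme of Figure 4c can be illustrated by a sketch that assumes the OCR step returns each recognized word together with its position on the page, and that the location of the field (for example the "month" field of a known statement layout) is available as a rectangle. `Word`, `Region` and `extract_field` are hypothetical names for this illustration and do not describe any of the products named above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    x: float   # left edge of the word, in page units
    y: float   # top edge of the word, in page units


Region = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def extract_field(words: List[Word], region: Region) -> str:
    """Join, in reading order, the OCR words whose top-left corner lies in the region."""
    x0, y0, x1, y1 = region
    inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    inside.sort(key=lambda w: (w.y, w.x))     # top-to-bottom, then left-to-right
    return " ".join(w.text for w in inside)
```

Given words at assumed positions `Word("Statement", 10, 5)`, `Word("March", 120, 5)` and `Word("Balance", 10, 40)`, the region `(100, 0, 200, 10)` extracts `"March"`, which could then be used as an index word.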

Figures 5a to 5c show typical schemes in which auxiliary data is used for
associating indices with documents - i.e. where index words are transmitted to
CMSE 14 separately from the original document. It should be evident that auxiliary
data can be transmitted immediately prior to or immediately following the original
document, or the auxiliary data can be transmitted at a different session, for
example a user may log on to the remote location, request to see recently scanned
[...] documents, and subsequently enter and send the auxiliary indexing
data. Another example is a user who asks a third party to scan and transmit a
[...] the user, and the user retrieves the document at a later time and
associates index words with it.

In certain embodiments, CMSE 14 maintains a list of documents with which
no indices are associated, in order to alert the user and enable off-line (i.e. in a
session not the same as the one in which scanning the document takes
place, see Figure 7) user assistance including but not limited to corrective action in
[...] will enter indices using the appropriate scheme for associating
[...] document.

In the case of a downloaded file sent directly to CMSE 14, the index words
[...] need not be separately entered by a user but may be generated
[...] based on the scheduled event which triggered the download or based
[...] (i.e. web site) from which the file was downloaded. However, if the
user [...] - he may add, delete and/or modify the indices in an off-line session.

Figure 5a shows an auxiliary data scheme using manual keying. Indices are
[...] entered) via a data entry device 62 such as a personal computer
([...] mouse, etc.), game console, personal digital assistant, or archiving device
32 with input module 47, etc. The auxiliary data is transmitted to CMSE 14 where
[...] to index a scanned image.

It should be evident that distinct entered codes can indicate whether the
auxiliary data refers to one or more images sent immediately prior, to be sent
[...] or sent in a different session.



Figure 5b illustrates a second scheme using auxiliary data to associate index
words with a scanned image. A manually filled scanning sheet 64 is scanned to give
a scanned image 66. Scanned image 66 is transmitted to CMSE 14 and CMSE 14
performs keyword identification techniques 68 such as OCR or ICR to recognize
and correctly index the separately scanned original document. Preferably, a user can
write the words in a legible hand on any available background (e.g. loose-leaf, index
card, envelope, label, etc.) for use as scanning sheet 64.

Figure 5c shows a scheme where voice annotation is used as auxiliary data.
A user indicates (i.e. speaks) index words into a microphone connected to a device
which can connect to communication interface 22 such as a personal computer,
personal digital assistant, etc., or a user indicates index words to archiving device 32
with microphone module 51. The voice is converted to digitized voice 70 so that it
can be transmitted as packets over the data network. When CMSE 14 receives the
communication, CMSE 14 performs a speech recognition algorithm 72, converting
speech to text so as to obtain indices.

Clearly, errors may occur in the processes described above with reference to
Figures 5a-5c. For example, the user might make an error while typing in an
index word in the manual keying scheme, CMSE 14 might fail to correctly
recognize a printed index word in the indexing sheet scheme, or the CMSE might
fail in performing the speech to text conversion in the voice annotation scheme
(alternatively, the user might make an error - for example, writing or saying an index
word that has not been defined for usage). In these cases the CMSE will prompt the operator
for user assistance including but not limited to corrective action. Such prompting
will include presenting the user with the unrecognized index word that resulted
from the CMSE processing, along with - if possible - the input that was used to
derive the erroneous result (for example, the image of the word that was OCR'ed, or
the digitized voice that was converted to text). The user's corrective action will
usually consist of deleting the erroneous word, and possibly entering a correct one
(or more than one). In some implementations, the user might be required to review
and confirm the interpretation of indices by CMSE 14, even if no evident problems
were identified during the process.

The user assistance/corrective action can take place either in real time (if
feasible, i.e. two way communications with capture element 12 is supported and
enabled) or at a later session upon connection to retrieval element 16.
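
The error handling described above (accepting recognized index words that are legitimate for the user, and queuing the rest for user assistance together with the input they were derived from) might be sketched as follows. `ReviewItem` and `check_indices` are hypothetical names, and the allowed-word set stands in for whatever list of legitimate index words CMSE 14 keeps for the user; none of this is prescribed by the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class ReviewItem:
    document_id: str
    recognized: str             # word produced by keying, OCR/ICR or speech recognition
    source: Optional[bytes]     # input it was derived from (word image, voice), if kept


def check_indices(document_id: str,
                  candidates: List[Tuple[str, Optional[bytes]]],
                  allowed: Set[str]) -> Tuple[List[str], List[ReviewItem]]:
    """Split candidate index words into accepted words and items needing user review."""
    accepted: List[str] = []
    review: List[ReviewItem] = []
    for word, source in candidates:
        if word.lower() in allowed:
            accepted.append(word.lower())
        else:
            # Keep the unrecognized word and its source so both can be shown to the user.
            review.append(ReviewItem(document_id, word, source))
    return accepted, review
```

The review list can be presented immediately when two way communication with the capture element is available, or held until the user next connects through a retrieval element.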

Figure 6 shows a scheme for associating index words with a document that is
menu driven. A display and pointing device 71, for example a personal computer,
game console, PDA, or archiving device 32 with input module 47 and display
module 50, etc., is used to display menu choices to the user and to allow the user to
indicate (select) appropriate indices to be associated with the scanned image of his
[...]

Figure 7 illustrates the process of associating index words with a document
at a later session (or off-line, as referred to in this document). A user retrieves stored
image 73 (the retrieval process is described below) and associates index words with
the document using manual keying, indexing sheet, voice annotation, or a menu as
described above with reference to Figures 5a, 5b, 5c, and 6. CMSE 14 transmits the
image files with no index words associated with them either automatically upon
identification of the user, or in response to a specific request.

Once associating indices function 20 (Figure 1) is performed, the indices are 
stored by CMSE 14 as part of the meta-data associated with the document, and are 
used to identify the required document during retrieval operation 30. Hereinafter, 
meta-data is defined as data which describes other data. In the context of the 
invention, meta-data includes but is not limited to a list of indices and time of 
storage of a document. 
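
A meta-data record of the kind defined above might be represented as in the following sketch. The class name DocumentMetadata, the field names, and the lower-casing of index words are assumptions made for illustration; the text only requires that the meta-data include a list of indices and the time of storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DocumentMetadata:
    """Data describing one stored document (hypothetical field names)."""
    document_id: str
    indices: list = field(default_factory=list)
    # Time of storage, recorded when the record is created.
    stored_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def associate_indices(metadata, index_words):
    """Add index words to the record, skipping blanks and duplicates."""
    for word in index_words:
        normalized = word.strip().lower()
        if normalized and normalized not in metadata.indices:
            metadata.indices.append(normalized)
    return metadata
```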

The retrieval process will now be described with reference to Figure 8. Once 
a document and its associated meta-data are stored, a user can access a document 
and view the digitized image using any web-enabled device that has an appropriate 
display. Such a device might be a PC, a set top box connected to a TV, a game 
console connected to a display (TV or other), a web enabled phone, an Internet 
appliance, etc. Note that the displayed image may be adapted to the specific device 
being used - for example a colored document may be displayed in B&W, resolution 
may be degraded to fit the characteristics of the display element, etc. 
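
The two adaptations mentioned above (color to B&W, and reduced resolution) can be illustrated with the following minimal sketch. It assumes, purely for the example, that an image is held as a list of rows of (red, green, blue) tuples with values 0-255; the grayscale weights are the common 0.299/0.587/0.114 luma weights, and the down-sampling simply keeps every n-th pixel. Neither choice is taken from the text.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to rows of 0-255 gray levels."""
    return [
        # Integer form of 0.299*r + 0.587*g + 0.114*b, rounded to nearest.
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def reduce_resolution(rows, factor):
    """Keep every factor-th pixel in each direction (nearest neighbour)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [row[::factor] for row in rows[::factor]]


def adapt_for_display(rgb_rows, supports_color, factor):
    """Apply the adaptations a given display element needs."""
    rows = rgb_rows if supports_color else to_grayscale(rgb_rows)
    return reduce_resolution(rows, factor)
```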
To retrieve the desired document, the user specifies a list of indices, generally 
through the browser of a web enabled device (step 76). However, the requested file 
can also be specified by indices entered from a phone keypad (DTMF tones) or 
cellular telephone through voice gateway 15, or through other input devices. 

Subsequently, in step 78, CMSE 14 searches through the meta-data of all 
stored files looking for all documents that have associated with them all the indices 
specified in the user's request. If more than one stored document has all the 
specified indices as part of its associated meta-data, the user is presented with basic 
information (such as date and time of storage, first line of the document, etc.) on all 
relevant documents. The user may then select a particular document for retrieval, or 
may step through the whole list, viewing each document sequentially (step 80). 
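
The search of step 78 amounts to a subset test: a document matches when every requested index appears in its meta-data. The sketch below illustrates this under assumed names; the dictionary keys "id", "indices", "stored_at" and "first_line" and the functions find_documents and basic_info are hypothetical, and an in-memory list stands in for whatever storage CMSE 14 actually uses.

```python
def find_documents(store, requested_indices):
    """Return every stored document that carries all the requested indices."""
    wanted = {word.strip().lower() for word in requested_indices if word.strip()}
    if not wanted:
        # An empty request matches nothing rather than everything.
        return []
    return [
        doc for doc in store
        if wanted <= {index.lower() for index in doc["indices"]}
    ]


def basic_info(matches):
    """Basic information shown to the user when several documents match."""
    return [(doc["id"], doc["stored_at"], doc["first_line"]) for doc in matches]
```

When find_documents returns more than one entry, the list produced by basic_info is what the user would choose from or step through.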

As is evident from the description above, in the case of retrieval - unlike in 
that of capture - two way communications between retrieval element 16 and CMSE 
14 is usually necessary. However, for embodiments where a browser is used with 
any web enabled device for retrieval element 16, the implementation of retrieval 
element 16 is straightforward and does not require any special hardware or 
software. 

The document is then retrieved by CMSE 14 (step 82). If the user (through 
retrieval element 16) elects for image retrieval mode, the image file is sent over IP 
(step 84). 

Figures 9a-9f show sample retrieval systems for accessing documents whose 
image files are stored in CMSE 14. 

Figure 9a shows a display 90 controlled by an embedded controller with 
display 90 directly connected to communication interface 28. Preferably display 90 
is identical to display module 50 which is part of archiving apparatus 32 (Figure 3). 

Figure 9b shows a retrieval system including a monitor 91 connected 
through a personal computer 94 to communication interface 28. 

Figure 9c illustrates a retrieval system using a television 92 connected 
through a set top box 95 to communication interface 28. 

In Figure 9d, a display 93 is connected to communication interface 28 
through a game console 96. 
Figure 9e illustrates a retrieval system using a personal digital assistant 98 
connected to communication interface 28. 

Figure 9f illustrates another embodiment of the retrieval system, using a 
cellular phone 100. 

Figure 9g illustrates another embodiment of the retrieval system, using an 
Internet appliance 102. 
In case retrieval element 16 does not contain (or is not connected to) a hard 
copy element such as a printer, then - for retrieval elements 16 which are web 
enabled - it will be possible to send the image of the retrieved document to a device 
where it may be printed, for example, as an e-mail attachment sent to a full featured 
PC workstation. This operation will be performed as part of the basic capabilities of 
the user's environment. 

In Figures 9b to 9e, connections between peripherals 91, 92, or 93 and PC 
94, set top box 95, and game console 96, may be wired connections (e.g. cables), or 
wireless (for example using the Bluetooth protocol). 

In some of the examples outlined above, communications interface 28 might 
include dedicated hardware and software (as in the case of a modem and appropriate 
drivers, for example), but in other cases the whole communications interface might 
be an integral, embedded part of the device being used (as in the case of a WAP 
enabled cellular phone, for example). 

In general, the requested image file is transmitted by CMSE 14 to the 
location from which the request was initiated. However, retrieval element 16 (for 
example through a web browser) may allow a user to specify that a stored image file 
be sent to another location. It is evident that retrieval element 16 must be coupled to 
or incorporated into an output device such as a display, printer, television, web 
enabled cellular telephone, etc. Alternatively, or in addition, retrieval element 16 
may be capable of locally storing the received image file for subsequent access. 

As an alternative or in addition to outputting and/or storing the retrieved 
document, retrieval element 16 may define the indices to be used by CMSE 14 - and 
once the document is found, instruct CMSE 14 to send the retrieved image to 
another device - for example, as an attachment sent to an e-mail address, or as a fax 
sent to a certain number (through fax gateway 13). 
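
Sending a retrieved image as an e-mail attachment could look like the sketch below, which uses the standard Python email package to build the message. The function name build_delivery_email, the subject and body wording, and the assumption that the image is a TIFF file are illustrative only; actually transmitting the message (for example through an SMTP server) is left out.

```python
from email.message import EmailMessage


def build_delivery_email(image_bytes, filename, recipient, sender):
    """Build an e-mail carrying the retrieved image as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Retrieved document: " + filename
    message.set_content("The requested document is attached.")
    # Assumes the stored image is a TIFF file; adjust subtype otherwise.
    message.add_attachment(
        image_bytes, maintype="image", subtype="tiff", filename=filename
    )
    return message
```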

According to preferred embodiments of the present invention, retrieval 
modes include voice retrieval and summary retrieval. 

In voice retrieval mode, the document contents (after OCR/ICR - step 86) 
are text-to-voice converted, and "played" to the user as an audio (voice) "file". It 
should be evident that a file that is text to voice converted for playing back to a user 
may have originally been a voice file (converted voice to text at the capture stage) or 
may have originally been an image file at the capture stage. In other embodiments 
where the file was initially digitized and stored as a digital voice file (see above), no 
text to voice conversion is necessary. This playback can be performed either using 
an analog audio output (for example, to be transmitted to the user over POTS or 
cellular phone) (step 87) or as digitized output (step 88) (for example, packets of 
VoIP - voice over IP - relayed over a network to a computer where they will be 
converted to analog form and played to the user). Text to speech technology is 
exemplified by RealSpeak, manufactured by Lernout & Hauspie Speech Products 
NV, Ieper, Belgium. 

An example of usage of voice retrieval mode might be a person "listening 
to" (= reading) through a phone a shopping list scanned into the system by his or her 
spouse. Another example might be a case in which the input is not a scanned paper 
document but rather a downloaded file - for example, a user driving a car while 
using his cellular radio to "listen to" (= read) his updated stock portfolio which was 
automatically downloaded from a web site to his on-line archive. 

In summary retrieval mode, CMSE 14 may be configured to automatically 
prepare pre-defined summaries based on contents of stored documents. An example 
might be monthly summaries of bank statements. To do this, CMSE 14 will use data 
derived from contents of certain fields in individually stored documents. 
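
A monthly bank statement summary of the kind mentioned above could be computed as in the following sketch. It assumes that the relevant fields have already been extracted from each stored document into entries with a "date" string in YYYY-MM-DD form and a signed "amount_cents" integer; both field names and the function monthly_summaries are hypothetical.

```python
from collections import defaultdict


def monthly_summaries(entries):
    """Total credits and debits per month from extracted statement fields."""
    totals = defaultdict(lambda: {"credits": 0, "debits": 0})
    for entry in entries:
        month = entry["date"][:7]  # "YYYY-MM" prefix of the date field
        amount = entry["amount_cents"]
        if amount >= 0:
            totals[month]["credits"] += amount
        else:
            totals[month]["debits"] += -amount
    # Return plain dictionaries ordered by month.
    return {month: dict(values) for month, values in sorted(totals.items())}
```

Amounts are kept as integer cents in this sketch to avoid rounding errors when summing.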

In certain embodiments, the system described above is implemented by an 
application service provider (ASP). Optionally, users will be billed based on usage 
of storage media, number of times data is accessed, amount of memory used by 
users, special functions, etc. In such embodiments, the process taking place in 
CMSE 14 also includes a billing system. 
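
A usage-based charge of this kind might be computed as in the sketch below. The text does not specify any tariff, so the Rates class, its fields, and the example figures are all hypothetical; the sketch only shows how the listed usage measures could be combined into one charge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical tariff, expressed in integer cents."""
    cents_per_megabyte: int
    cents_per_access: int
    cents_per_special_function: int


def usage_charge(storage_megabytes, accesses, special_functions, rates):
    """Charge in cents for one billing period, from the usage counters."""
    if min(storage_megabytes, accesses, special_functions) < 0:
        raise ValueError("usage counters cannot be negative")
    return (
        storage_megabytes * rates.cents_per_megabyte
        + accesses * rates.cents_per_access
        + special_functions * rates.cents_per_special_function
    )
```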
While the invention has been described with respect to a limited number of 
embodiments, it will be appreciated that many variations, modifications and other 
applications of the invention may be made. 